Genomics and the Irreducible Nature of Eukaryote Cells


C. G. Kurland, L. J. Collins, D. Penny


Large-scale comparative genomics in harness with proteomics has substantiated fundamental 
features of eukaryote cellular evolution. The evolutionary trajectory of modern eukaryotes is 
distinct from that of prokaryotes. Data from many sources give no direct evidence that eukaryotes 
evolved by genome fusion between archaea and bacteria. Comparative genomics shows that, under 
certain ecological settings, sequence loss and cellular simplification are common modes of 
evolution. Subcellular architecture of eukaryote cells is in part a physical-chemical consequence of 
molecular crowding; subcellular compartmentation with specialized proteomes is required for the
efficient functioning of proteins.


Comparative genomics and proteomics
have strengthened the view that modern
eukaryote and prokaryote cells have long
followed separate evolutionary trajectories. Be- 
cause their cells appear simpler, prokaryotes 
have traditionally been considered ancestors of 
eukaryotes (1-4). Nevertheless, comparative
genomics has confirmed a lesson from paleon- 
tology: Evolution does not proceed monoton- 
ically from the simpler to the more complex 
(5-9). Here, we review recent data from pro- 
teomics and genome sequences suggesting that 
eukaryotes are a unique primordial lineage. 

Mitochondria, mitosomes, and hydrogeno- 
somes are a related family of organelles that 
distinguish eukaryotes from all prokaryotes 
(10). Recent analyses also suggest that early 
eukaryotes had many introns (11, 12), as well as RNAs
and proteins found in modern spliceosomes
(13). Indeed, it seems that life-history parameters
affect intron numbers (14, 15). In addition,
“molecular crowding” is now recognized as an 
important physical-chemical factor contributing 
to the compartmentation of even the earliest 
eukaryote cells (16, 17). 

Nuclei, nucleoli, Golgi apparatus, centrioles, 
and endoplasmic reticulum are examples of 
cellular signature structures (CSSs) that dis- 
tinguish eukaryote cells from archaea and bacte- 
ria. Comparative genomics, aided by proteomics 
of CSSs such as the mitochondria (18, 19),
nucleoli (20, 21), and spliceosomes (13, 22), 
reveals hundreds of proteins with no orthologs 
evident in the genomes of prokaryotes; these 
are the eukaryotic signature proteins (ESPs) 
(23, 24). The many ESPs within the subcel- 
lular structures of eukaryote cells provide 
landmarks to track the trajectory of eukaryote
genomes from their origins. In contrast,
hypotheses that attribute eukaryote origins to
genome fusion between archaea and bacteria 
(25-30) are surprisingly uninformative about 
the emergence of the cellular and genomic sig- 
natures of eukaryotes (CSSs and ESPs). The 
failure of genome fusion to directly explain any 
characteristic feature of the eukaryote cell is a 
critical starting point for studying eukaryote 
origins.

Fig. 1. The common ancestor of eukaryotes,
bacteria, and archaea may have been a community
of organisms containing the following: autotrophs
that produced organic compounds from CO2, either
photosynthetically or by inorganic chemical reactions;
heterotrophs that obtained organics by
leakage from other organisms; saprotrophs that
absorbed nutrients from decaying organisms; and
phagotrophs that were sufficiently complex to
envelop and digest prey. +M: endosymbiosis of
mitochondrial ancestor.


It is agreed that, whether using gene con- 
tent, protein-fold families, or RNA sequences 
(31-36), the unrooted tree of life divides into 
archaea, bacteria, and eukaryotes (Fig. 1). On 
such unrooted trees, the three domains diverge 
from a population that can be called the last 
universal common ancestor (LUCA). How- 
ever, LUCA (37) means different things to 
different people, so we prefer to call it a common
ancestor; in this case it is the hypothetical
node at which the three domains coalesce in
unrooted trees. 

There are links between comparative ge- 
nomics and the ecology of organisms. These 
include the aerobic/anaerobic states of the 
environment and the adaptive fit of organelles 
such as mitochondria, hydrogenosomes, and 
mitosomes (10, 18, 19, 38-41). In addition to
the advantages from oxidative metabolism and/ 
or oxygen detoxification, other advantages must 
have accrued from having a cellular compart- 
ment with dense proteomes (15, 38, 42). Eco- 
logical specialization can account for the 
differences between prokaryote and eukaryote 
cell architectures and genome sizes. Small pro- 
karyote cells with streamlined genomes may 
reflect adaptation to rapid growth and/or mini- 
mal resource use by autotrophs, heterotrophs, and 
saprotrophs. Divergent evolutionary paths may 
emerge with the adoption of a phagotrophic- 
feeding mode in an ancestor of eukaryotes. This 
uniquely eukaryote feeding mode requires a 
larger and more complex cell, consistent with 
earlier suggestions that a unicellular raptor 
(predator), which acquired a bacterial endo- 
symbiont/mitochondria lineage, became the 
common ancestor of all modern eukaryotes 
(3, 4, 43). Indeed, predator/prey relationships 
may provide the ecological setting for the 
divergence of the distinctive cell types adopted 
by eukaryotes, bacteria, and archaea. 


Proteomics of Cell Compartments 


Comparative genomics and proteomics reveal 
phylogenetic relationships between proteins 
making up eukaryote subcellular features and 
those found in prokaryotes. We distinguish three 
main phylogenetic classes. The first comprises proteins
that are unique to eukaryotes: the ESPs. We
place the ESPs in three subclasses: proteins
arising de novo in eukaryotes; proteins so
divergent from homologs in other domains that
their relationship is largely obscured; and
descendants of proteins that have been lost from other
domains, surviving only as ESPs in eukaryotes.

The second class contains interdomain 
horizontal gene transfers; these are proteins 
occurring in two domains with the lineage of 
one domain rooted within their homologs in a 
second domain (44). The third class contains 
homologs found in at least two domains, but 
the proteins of one domain are not rooted 
within another domain(s); instead, the homo- 
logs appear to descend from the common an- 
cestor (Fig. 1). Most eukaryote proteins shared 
with prokaryotes are distant, rather than close,
relatives. Thus, proteins shared between domains
appear to be descendants of the common
ancestor; few seem to result from interdomain
lateral gene transfer (31-35).
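
The three phylogenetic classes described above amount to a simple decision rule, which can be sketched in Python. This is only an illustration under stated assumptions: the record type `ProteinFamily`, its field `rooted_within`, and the function `classify` are hypothetical names introduced here, and the sketch presumes that homolog detection and tree building have already been done by other means.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

DOMAINS = frozenset({"archaea", "bacteria", "eukaryotes"})


@dataclass(frozen=True)
class ProteinFamily:
    """Hypothetical summary of one protein family's phylogeny."""
    name: str                            # illustrative label only
    domains: FrozenSet[str]              # domains in which homologs are detected
    rooted_within: Optional[str] = None  # domain whose homologs enclose another
                                         # domain's lineage, if the tree shows that


def classify(family: ProteinFamily) -> str:
    """Assign a family to one of the three classes described in the text."""
    if not family.domains or not family.domains <= DOMAINS:
        raise ValueError(f"unexpected domains: {sorted(family.domains)}")
    # Class 1: homologs detected only in eukaryotes.
    if family.domains == {"eukaryotes"}:
        return "ESP"
    # Families confined to one prokaryote domain fall outside the scheme.
    if len(family.domains) == 1:
        return "outside the three classes"
    # Class 2: one domain's lineage is rooted within another domain's homologs.
    if family.rooted_within is not None:
        return "interdomain transfer"
    # Class 3: shared between domains, with no domain nested in another.
    return "common-ancestor descendant"


if __name__ == "__main__":
    examples = [
        ProteinFamily("family A", frozenset({"eukaryotes"})),
        ProteinFamily("family B", frozenset({"eukaryotes", "bacteria"}), "bacteria"),
        ProteinFamily("family C", DOMAINS),
    ]
    for fam in examples:
        print(fam.name, "->", classify(fam))
```

The rule cannot separate the three ESP subclasses (de novo origin, extreme divergence, or loss from the other domains), because all three leave the same pattern: homologs detectable only in eukaryotes.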

Although the genomes of mitochondria are 
clearly descendants of α-proteobacteria (45, 46),
proteomics and comparative genomics identify
relatively few proteins in yeast and human
mitochondria descended from the ancestral
bacterium (17, 18, 36, 47). Several hundred 
genes have been transferred from the ancestral 
bacterium to the nuclear genome, but most 
proteins from the original endosymbiont have 
been lost. For yeast, the largest protein class 
contains more than 200 eukaryote proteins 
(ESPs) targeted to the mitochondrion but en- 
coded in the nucleus. In addition, the yeast 
nucleus encodes 150 mitochondrial proteins not 
uniquely identifiable with a single domain but 
apparently eukaryotic descendants from the com- 
mon ancestor. Accordingly, the yeast and human 
mitochondria proteomes emerge largely as 
products of the eukaryotic nuclear genome 
(85%) and only to a lesser degree (15%) as direct 
descendants of endosymbionts (17, 18, 36, 45). 
The strong representation 
of ESPs in their prote- 
omes means that mitochon- 
dria and their descendants 
are usefully viewed as “hon- 
orary” CSSs. 

There are substantial 
numbers of ESPs in the 
other CSSs. For the pro- 
teome of the reduced an- 
aerobic parasite Giardia 
lamblia (23), searches of 
2136 proteins found in 
each of Saccharomyces 
cerevisiae, Drosophila 
melanogaster, Caenorhab- 
ditis elegans, and Arabi- 
dopsis thaliana yielded 
347 ESPs for G. lamblia. 
This was reduced to rough- 
ly 300 by rigorous screen- 
ing, with ESPs distributed 
between nuclear and cy- 
toplasmic compartments 
(Fig. 2) (48). The ubiquity 
of the ESPs and the ab- 
sence of archaeal de- 
scendants are not easily 
explained by a prokary- 
ote genome fusion model 
(49). The simplest inter- 
pretation is that the host for the endosymbiont/ 
mitochondrial lineage was an ancestral eukaryote. 

Similar results are obtained for another 
reduced eukaryote, the intracellular parasite 
Encephalitozoon cuniculi. A recent study (24) 
identified 401 ESPs, of which 295 had homo- 
logs among the ESPs of G. lamblia (23). Two 
major categories of ESPs in the G. lamblia and 
E. cuniculi genomes were distinguished: those 
associated with the CSSs (Fig. 2) and those 
involved in control functions such as guanosine 
triphosphate (GTP) binding proteins, kinases, 
and phosphatases (7). It was also observed (23) 
that many characteristic eukaryotic proteins with
weak sequence homology to prokaryotic proteins
but more convincing homologies of structural
fold, such as the actins, tubulins, kinesins,
ubiquitins, and some GTP binding proteins, are
among the most highly conserved eukaryotic
proteins. These may be descendants of the com- 
mon ancestor recruited early in the evolution of 
the eukaryotic nuclear genome. 

Nucleoli, whose proteomes have been characterized (20, 21), are examples
of essential eukaryote compartments that are not wrapped
in double membranes and for which there is no
suspicion of an endosymbiotic origin. From 271
proteins in the human nucleolar proteome, 206
protein folds were identified and classified
phylogenetically (20, 21). Of these, 109 are eukaryotic
signature folds, and the remaining ones
appear to be descendants of the common an- 
cestor, occurring in two or three domains. 
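
The proportions implied by the counts quoted in this section are easy to lose track of, so the arithmetic is sketched below in Python. The counts are taken from the text above; the helper `fraction` and the dictionary labels are illustrative names introduced here, and the figure of 300 screened G. lamblia ESPs is the rounded value given in the text, not an exact count.

```python
def fraction(part: int, whole: int) -> float:
    """Return part/whole after checking that the counts are consistent."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError(f"inconsistent counts: {part} of {whole}")
    return part / whole


# (part, whole) pairs as quoted in the text.
counts = {
    "G. lamblia ESPs retained after screening (approximate)": (300, 347),
    "E. cuniculi ESPs with homologs among G. lamblia ESPs": (295, 401),
    "human nucleolar folds that are eukaryotic signature folds": (109, 206),
}

if __name__ == "__main__":
    for label, (part, whole) in counts.items():
        print(f"{label}: {part}/{whole} = {fraction(part, whole):.1%}")
```

On these numbers, slightly more than half of the classified nucleolar folds are eukaryotic signature folds, and roughly three-quarters of the E. cuniculi ESPs have counterparts among the G. lamblia ESPs.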

Fig. 2. Distribution of ESPs in the proteome of G. lamblia. ESPs (23) were matched to the human
International Protein Index data set (48) and then assigned to individual CSSs based on their gene
ontology annotations. A protein may be present in more than one CSS (e.g., a protein involved in
transport from the nucleus to the cytoplasm will be assigned to both CSSs). Black numbers are the
number of proteins assigned to each CSS from the total G. lamblia proteome (AACB00000000)
(3077 ORFs matched and linked to gene ontology); red numbers are the ESPs assigned to each CSS
(320 proteins matched and linked to gene ontology).

The spliceosome is a unique molecular
machine that removes introns from eukaryote
mRNAs (22). Even though we do not know the
ancestral processing signals for the earliest
eukaryotes (50), roughly half of the 78 spliceosomal
proteins likely to be present in the ancestral
spliceosome are ESPs (13), whereas
the other half, which includes the Sm/LSm proteins
(51), has homologs in bacteria and archaea
(13). The distributions of both ESPs and
putative descendants of the common ancestor
suggest that many components of modern
spliceosomes were present in the common
ancestor (52).

The subdivision into subcellular compart- 
ments (CSSs) with characteristic proteomes re- 
stricts proteins to volumes considerably smaller 
than the whole cell. Concentrations of macromolecules
in cells are very high, typically
between 20 and 30% of weight or volume (53).
Such densities are described as “molecular 
crowding” because the space between macro- 
molecules is much less than their diameters; 
consequently, diffusion of proteins in cells is 
retarded (54). Molecular crowding favors mac- 
romolecular associations, large complexes, and 
networks of proteins that support biological 
functions (16, 17, 53). 

High densities enhance the association ki- 
netics of small molecules with proteins because 
the excluded volumes of the proteins reduce the 
effective volume through which small molecules 
diffuse (55). The sum of these effects is that the 
high macromolecular densities within CSSs en- 
hance the kinetic efficiencies of proteins. The 
same principles apply to the smaller prokary- 
otic cells, but the effects 
are accentuated in larger 
cells. Subdividing high 
densities of proteins in- 
to more or less distinct 
compartments contain- 
ing functionally interac- 
tive macromolecules is 
expected to be an early 
feature of the eukary- 
ote lineage. The distinc- 
tive proteome of nucleoli 
demonstrates that com- 
partmentation does not 
require an enclosing mem- 
brane. Furthermore, cell 
fusion is not required to 
account for, nor does it 
explain (49), the large 
number of eukaryote cell 
compartments. 


Selection Gives and Selection Takes


Genomes evolve continu- 
ously through the interplay 
of unceasing mutation, 
unremitting competition, 
and ever-changing envi- 
ronments. Both sequence 
loss and sequence gain 
can result. In general, expanded genome size, 
along with augmented gene expression, increases 
the costs of cell propagation so the evolution of 
larger genomes and larger cells requires gains in 
fitness that compensate (15, 56, 57). Conversely, 
genome reduction is expected to lower the costs 
of propagation. There is an ever-present poten- 
tial to improve the efficiency of cell propaga- 
tion by reductive evolution. 

Environmental shifts may neutralize se- 
quences, leaving no selective pressure to main- 
tain them against the persistent flux of deleterious 
mutations. Such neutralized sequences eventually
and inevitably disappear because of “mutational
meltdown” (14, 15, 56, 57). Genome
reduction can be achieved through differential 
loss of coding and noncoding sequences (compaction)
(57). Theileria has evolved through
gene loss as well as compaction of its intergenic 
spaces, whereas Paramecium has eliminated 
only a small length of genes but markedly re- 
duced the number of its introns (57). The com- 
plex genomes of some vertebrates (pufferfish, 
Takifugu) are so highly compacted that their 
genome lengths are reduced to one-eighth 
that of other vertebrates (58). Extreme cellular 
simplification is observed among anaerobic 
protists, including simplification of CSSs such as 
mitochondria and the Golgi apparatus (59-64). S. 
cerevisiae, which underwent a whole-genome 
duplication, subsequently purged ~85% of the 
duplicated sequences (65, 66). The evolution 
of genome content is clearly not monotonic 
(Fig. 3) (67, 68). Genome sizes on the branches 
of a phylogenetic tree of fungi show irregular 
genome enlargement (including du- 
plication) and reduction. Examples 
of ecological circumstances driving 
genome reduction are seen in many 
intracellular endosymbionts and par- 
asites, which gain few genes but lose 
many genes responsible for metabolic 
flexibility (6-8, 69). 

The mitochondrion is even more 
extreme in its reductive evolution; 
its ancestral bacterial genome has 
been reduced to a vestigial microgenome
supported by a predominantly
eukaryote proteome (18, 19).
Genomes of modern mitochondria 
encode between 3 and 67 proteins 
(44), whereas the smallest known 
free-living α-proteobacterium (Bartonella
quintana) encodes ~1100
proteins (70). Taking Bartonella 
as a minimal genome for the free- 
living ancestor of mitochondria, 
nearly all of the bacterial coding 
sequences have been lost from the 
organelle, though not necessarily 
from the eukaryote cell. The mito- 
chondrial genome of the protist Reclinomonas 
americana is the largest known but has still 
lost more than 95% of its original coding 
capacity. 

This abbreviated account of genome reduction 
illustrates the Darwinian view of evolution as a 
reversible process in the sense that “eyes can be 
acquired and eyes can be lost.” Genome evolu- 
tion is a two-way street. This bidirectional sense 
of reversibility is important as an alternative to 
the view of evolution as a rigidly monotonic 
progression from simple to more complex 
states, a view with roots in the 18th-century 
theory of orthogenesis (71). Unfortunately,
such a model has been tacitly favored by 
molecular biologists who appeared to view 
evolution as an irreversible march from sim- 
ple prokaryotes to complex eukaryotes, from 
unicellular to multicellular. The many well- 
documented instances of genome reduction 
provide a necessary corrective measure to the
often-unstated assumption that eukaryotes must
have originated from prokaryotes. 


The Hunt for the Phagotrophic Unicellular Raptor


Proteomics, together with comparative ge- 
nomics, allows glimpses of the cell structure 
of eukaryote ancestors. They are likely to have 
had introns as well as the complex machinery 
for removing them, and much of that RNA 
processing machinery still exists in their
descendants (13, 22, 51). Because of molecular
crowding, it is expected that interacting 
proteins would tend to accumulate in function- 
al domains, making rudimentary CSSs early 
features of the large-celled eukaryotes. We 
cannot say whether there was a substantial
period of time after the emergence of cells
when there were no unicellular raptors or
predators, a Garden of Eden. However, the
identification among prokaryotes of orthologs
with structural affinities to actins, tubulins,
kinesins, and ubiquitins (72, 73) is consistent
with some early organisms having evolved a
phagotrophic life-style. This echoes a recurrent
theme (3, 4, 43) in which it was supposed that
the earliest eukaryotes could feed as
unicellular “raptors.”

Fig. 3. Genome sizes (in megabases) can increase and decrease in
lineages because of events such as genome duplication and reductive
evolution, as illustrated in this fungal phylogeny [adapted from (67, 68)].
Genome sizes were obtained from the National Center for Biotechnology
Information (NCBI) Genome biology (www.ncbi.nih.gov/Genomes/)
database. GD, genome duplication; RE, reductive evolution.

We expect that the earliest organisms
were primarily autotrophs, heterotrophs, and
saprotrophs, an excellent community to support
raptors. Phagotrophy is a hallmark of eukaryotic
cells and is unknown among modern
prokaryotes, and so it is natural to reconsider 
this feeding mode as a defining feature of an- 
cestral eukaryotes. Cavalier-Smith (43) sug- 
gested that the ancestors of eukaryotes were 
phagotrophic, anaerobic free-living protists, 
called archezoa. He also identified present-day
anaerobic parasites such as Entamoeba,
Giardia, and Microsporidia as archezoa. However,
these organisms are descendants of aerobic,
mitochondriate eukaryotes (10). Genome
reduction and cellular simplification are hall- 
marks of parasites and symbionts (6-8, 46, 69). 
Indeed, most of the eukaryotic anaerobes 
studied so far are parasites or symbionts of 
multicellular creatures. 

For the reasons outlined above, we favor 
the idea (3, 4) that the host that acquired the 
mitochondrial endosymbiont was a unicel- 
lular eukaryote predator, a raptor. The emer- 
gence of unicellular raptors would have had a 
major ecological impact on the evolution of 
the gentler descendants of the common ancestor.
These may have responded with several
adaptive strategies: They might outproduce
the raptors by rapid growth or hide
from raptors by adapting to ex- 
treme environments. Thus, the hy- 
pothetical eukaryote raptors may 
have driven the evolution of their 
autotrophic, heterotrophic, and sapro- 
trophic cousins in a reductive mode 
that put a premium on the relatively 
fast-growing, streamlined cell types 
we call prokaryotes (74). 


Concluding Remarks 


Genomics and proteomics have great- 
ly increased our awareness of the 
uniqueness of eukaryote cells. This, 
together with increased understand- 
ing of molecular crowding, as well as 
the dynamic, often reductive nature of 
genome evolution, offers a new view 
of the origin of eukaryote cells. The 
eukaryotic CSSs define a unique 
cell type that cannot be deconstructed 
into features inherited directly from 
archaea and bacteria. Only a small
fraction (~15%) of the proteins identified in the yeast
and human mitochondrial proteomes are of
α-proteobacterial descent; none
seem to be direct descendants of archaea, and
roughly half seem to be exclusively eukaryotic
(18, 19, 38, 47). The identification of the
α-proteobacterial descendants in this proteome
validates the phylogenetic distinction between 
direct descent from genes transferred to the host 
from the bacterial endosymbiont, as opposed to 
descent from a hypothetical common ancestor. 

ESPs are important markers of the novel 
evolutionary trajectory of modern eukaryotes. 
In contrast, most proteins occur in more than 
one domain (31-36), and most of these could
derive from the common ancestor. We take the 
relative abundance of signature proteins among 
eukaryotes to indicate that their genomes typ- 
ically have a greater coding capacity than those 
of prokaryotes. It remains to be seen which 
ESPs have been lost from prokaryotes and 
which have been acquired by eukaryotes during 
their evolution.

The hypothetical fusion of an archaeon and
a bacterium explains nothing about the special
features of the modern eukaryote cell (49), nor
about the many signature proteins. Nothing in global
phylogenies based on ribosomal RNA, pooled 
proteins, and protein-fold families indicates that 
genome fusion generated the eukaryote lineage. 
Perhaps interest in fusion models arose because 
BLAST searches suggest that different eukary- 
otic coding sequences are sometimes more 
closely related to archaeal homologs and other 
times more closely related to bacterial homo- 
logs (49). These weak domain-specific affin- 
ities do need to be understood and alternative 
explanations found. However, in our view (49), 
they do not indicate that the eukaryote genome 
arose as a mosaic pieced together from archaeal 
and bacterial genomes. 

It is an attractively simple idea that a 
primitive eukaryote took up the endosymbiont/ 
mitochondrion by phagocytosis (3, 4, 43). A 
unicellular raptor with a larger, more complex 
cell structure than that of present-day prokary- 
otes is envisioned as the host of the ancestral 
endosymbiont. This scenario, which is not con- 
tradicted by new data derived from comparative 
genomics and proteomics, is a suitable starting 
point for future work. Acquisition of genome 
sequences from free-living eukaryotes among 
basal lineages is a high priority.